
Developmental Biology

Elsevier BV

Preprints posted in the last 7 days, ranked by how well they match Developmental Biology's content profile, based on 134 papers previously published here. The average preprint has a 0.14% match score for this journal, so anything above that is already an above-average fit.

1
Keeping human in the loop: A three-phase generative AI workflow for research integrity in data-intensive science. A methodological case study using elite Ethiopian distance-running data

Galko, P.; Yisamaw, A.; Haugen, T.; Seiler, S.

2026-05-29 sports medicine 10.64898/2026.05.29.26354013 medRxiv
Top 4%
0.3%

Background: Generative AI tools can support data-intensive research by writing code, drafting prose, searching analytical possibilities, and stress-testing claims. They can also produce false citations, drift between statistical specifications, and lose continuity across long investigations. This paper describes a practical workflow for using AI systems in empirical research while keeping discovery, verification, and accountability inspectable. Methods: We developed and applied a three-phase human-AI workflow to a case study of 14 elite Ethiopian distance runners. The dataset contained 22,605 GPS segments collected across 97 consecutive days in late 2025, supplemented by venue and athlete metadata collected in the field. Phase 1 used an autonomous data-exploration tool to pre-filter the hypothesis space across five seeded research questions. Phase 2 used an AI system under direct human guidance to develop candidate findings into numerical claims, verification scripts, and draft text. Phase 3 used an independent AI system in an adversarial role to stress-test methods, statistics, prose, figures, and citations. The workflow was informed by Pearl's distinction between association, intervention, and counterfactual reasoning, with human judgement retained for research direction, interpretation, and final claims. Results: The workflow produced three empirical analyses and a documented correction process. The analyses estimated an altitude-to-sea-level pace correction of +0.10 min/km per 1,000 m at matched heart rate, showed why pooled altitude-surface regression was not identifiable within this venue system, documented method-dependence in heart-rate-based intensity classification, characterised within-venue route variation as a 64/36 path-fixed-to-trail-variable split with the Sululta label resolving into two functionally distinct sub-venues, and reframed the cohort's training through a 3×3×3 prescription lattice grounded in Ethiopian coaching practice.
The adversarial phase identified several hallucinated citations, a terminology error conflating HC1 and cluster-robust standard errors, and several inconsistencies between prose, figures, and computed results. Verification scripts re-derived nearly all numerical claims from the cleaned lap-level data. Conclusions: The case study shows how researchers can organise AI-assisted empirical work so that candidate discovery, claim construction, independent stress-testing, and final accountability remain separated. The workflow did not remove the need for domain expertise or human judgement. Its value was in making the route from candidate finding to manuscript claim explicit, reproducible, and open to challenge. Trial registration: Not applicable.
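The abstract states that verification scripts re-derived nearly all numerical claims from the cleaned lap-level data. A minimal sketch of such a check, assuming each manuscript claim can be recomputed as a single number and compared with the reported value within an absolute tolerance (for example, half a unit in the last reported digit), might look like the following. `Claim`, `failed_claims`, and all example values are hypothetical; they are not taken from the authors' scripts or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One numerical statement in a manuscript, paired with its recomputed value."""
    label: str
    reported: float    # value as printed in the text, table, or figure
    recomputed: float  # value re-derived from the cleaned data by a script
    tolerance: float   # absolute tolerance, e.g. half a unit in the last reported digit


def failed_claims(claims: list[Claim]) -> list[str]:
    """Return labels of claims whose reported and recomputed values disagree beyond tolerance."""
    return [
        c.label
        for c in claims
        if not math.isclose(c.reported, c.recomputed, rel_tol=0.0, abs_tol=c.tolerance)
    ]


# Hypothetical example: the first claim agrees within tolerance, the second does not.
claims = [
    Claim("claim_A", reported=0.10, recomputed=0.098, tolerance=0.005),
    Claim("claim_B", reported=0.10, recomputed=0.13, tolerance=0.005),
]
print(failed_claims(claims))  # ['claim_B']
```

Returning a plain list of failures makes the check easy to run automatically and to fail loudly whenever prose and computed results drift apart, which is the kind of inconsistency the adversarial phase reported finding.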

2
Dentine markers of pre/early postnatal lead exposure links with brain, cognitive, and behavioral outcomes in adolescents

Marshall, A. T.; Kan, E.; Adise, S.; König, M.; McConnell, R.; Martinez, M.; Midya, V.; Arora, M.; Sowell, E. R.

2026-05-27 pediatrics 10.64898/2026.05.26.26354134 medRxiv
Top 7%
0.2%

Lead is a toxic metal ubiquitous in our environment. While dramatic reductions in lead sources have paralleled equivalent decreases in lead-poisoning rates, chronic lead exposure remains a critical public health concern. Childhood lead exposure, even at low levels, is linked to changes in cognitive development, but less is known about lead's effects on children's brain structure, especially as a result of in utero exposure. We measured prenatal and early-postnatal lead exposure in shed deciduous teeth of 448 children aged 9 and 10 years (from 20 United States cities) and linked those lead levels to childhood brain structure, cognition/behavior, and neighborhood- and family-level socioeconomic characteristics. Here we show negative associations between tooth-lead levels and the thickness of the brain's cortex, particularly in regions linked to language processing. With increasing tooth-lead levels, children of lower-income (versus higher-income) families showed steeper declines in receptive vocabulary. Caregiver-reported behavioral problems exhibited similar associations. Because in utero exposure is linked to adverse neurodevelopmental outcomes well before lead exposure and its risks are evaluated by healthcare professionals, prenatal screening of maternal lead levels/exposure, coupled with recommended strategies to reduce its placental transmission, may help reduce lead's effects on future generations.

3
Application of SinoPlan in Trajectory Planning for Robot-Assisted Intracerebral Hematoma Puncture

Zhang, F. y.; Yao, J.; Zhou, Q. y.; Fang, Y. c.; Hu, A.; Wang, Y.; Ding, W.; Wu, X.; Gu, Y.

2026-05-27 surgery 10.64898/2026.05.24.26353998 medRxiv
Top 8%
0.1%

Robot-assisted hematoma puncture has developed substantially in primary hospitals across the country. The Sino Plan software system, independently developed by Sinovation, is the core of its intelligent surgical robot. We conducted a comparative study of imaging indicators (such as residual hematoma volume and hematoma clearance rate) and prognostic indicators in patients who underwent hematoma puncture at our hospital over a 9-year period, before and after the introduction of Sino Plan. After Sino Plan was introduced, the hematoma clearance rate was significantly higher and the residual hematoma volume was markedly lower. Regarding prognosis, GCS scores did not differ significantly between the two groups, but the incidence of adverse prognostic events was lower in patients treated with Sino Plan. In conclusion, this 9-year retrospective analysis at our hospital indicates that Sino Plan offers distinct advantages. However, its performance in certain special cases suggests that further software improvements are warranted to meet the demands of more specific clinical scenarios.

4
DISCERN: A Clinical Impact-aware Framework for Radiology Report Comparison

Sharma, R.; Beeche, C.; Dong, J.; Zhuang, R.; Qu, H.; Zhang, R.; Gangaram, V.; Goswami, P.; Xin, J.; Ballard, J.; Goldberg, A.; Sagreiya, H.; Long, Q.; Chen, T.; Witschey, W. R.

2026-05-27 radiology and imaging 10.64898/2026.05.26.26353612 medRxiv
Top 9%
0.1%

The surge in medical imaging has spurred the development of vision-language models (VLMs) to alleviate radiologist workloads. However, clinical deployment is hindered by the lack of meaningful evaluation frameworks. Current metrics, ranging from semantic similarity to large language model (LLM)-based judges, often fail to distinguish between clinically trivial and critical discrepancies and therefore poorly reflect real-world clinical judgment. To address this, we introduce DISCERN (Discordance and Significance-aware Entity-level Radiology Report Comparison), a significance-aware framework that weights report errors by their potential impact on patient care. Our results demonstrate that DISCERN, powered by closed-source LLMs, aligns more closely with expert radiologist assessments than traditional metrics or current LLM evaluators, providing a more interpretable and clinically relevant benchmark. By modeling radiologist prioritization and entity-level feedback, DISCERN facilitates targeted model refinement and supports the safer integration of generative AI into clinical workflows.

5
Using artificial intelligence for radiotherapy clinical trial quality assurance: analysis of a multi-institutional clinical trial for neurovascular-sparing prostate stereotactic ablative radiotherapy

Doucette, M.; Zhang, Y.; Liao, C.-Y.; Lin, M.-H.; Yan, Y.; Dess, R. T.; Tendulkar, R. D.; Garant, A.; Hannan, R.; Jiang, S.; Nguyen, D.; Desai, N.; Yang, D. X.

2026-05-29 health informatics 10.64898/2026.05.27.26354252 medRxiv
Top 9%
0.1%

Our study evaluated whether a deep learning auto-segmentation model combined with machine learning triage can streamline radiotherapy clinical trial quality assurance (QA). We analyzed 107 stereotactic ablative radiotherapy (SABR) cases from a multi-institutional phase II clinical trial of neurovascular-sparing prostate SABR, focusing on physician contours of the internal pudendal artery (IPA), a novel organ at risk with substantial interobserver variability. The trial principal investigator scored each contour as Per-Protocol or Minor Deviation/Unacceptable. We applied a deep learning model for IPA auto-segmentation. Agreement between human and AI contours was then quantified using 14 overlap, distance, and surface metrics, and a supervised classifier was trained on these metrics to flag clinical trial protocol deviations. Although AI segmentation achieved only modest geometric accuracy (mean Dice similarity coefficient 0.446; 95th percentile Hausdorff distance 14.23), a machine learning classifier using all 14 metrics yielded an AUROC of 0.836 and, on the 27-case hold-out set, flagged every Minor Deviation/Unacceptable case (100% sensitivity) with 6 false positives and no false negatives. AI segmentation combined with metrics-based machine learning can triage protocol deviations within a multi-institution radiotherapy clinical trial, supporting prospective evaluation of AI-assisted trial QA.
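The Dice similarity coefficient reported above measures voxel overlap between two contours as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical). A minimal sketch for two flattened binary masks follows; treating two empty masks as perfect agreement is a common convention but an assumption here, and this is not the trial's implementation.

```python
from typing import Sequence


def dice_coefficient(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flattened binary masks (nonzero = inside contour)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    if size_a + size_b == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / (size_a + size_b)


physician = [1, 1, 1, 0, 0]   # toy 5-voxel masks, not trial data
ai_contour = [0, 1, 1, 1, 0]
print(dice_coefficient(physician, ai_contour))  # 2*2 / (3+3), about 0.667
```

For a thin, tortuous vessel such as the IPA, a small lateral shift removes much of the overlap, so Dice can be low even for a contour a reviewer might accept; this is plausibly one reason to combine many metrics rather than rely on Dice alone.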

6
Phenome-Wide Association Study of Pre-Cancer Diagnosis Electronic Health Records Identifies Risk and Inverse Associations in the All of Us Research Program

Rich, C. C. D.; Bang, E. J.; Bair, A. B.; Richardson, B. E.; Millington, J. L.; Bates, B. A.; Davis, M. F.; Bailey, M. H.

2026-05-28 health informatics 10.64898/2026.05.26.26353823 medRxiv
Top 9%
0.1%

Background: The All of Us Research Program is a rich resource for cancer epidemiology research, with over 400,000 participants whose whole genome sequences are linked to electronic health records (EHR). Large cancer datasets often focus exclusively on cases without controls and neglect pre-diagnosis healthcare events. Here, we perform a phenome-wide association study (PheWAS) comparing EHR data from at least 1 year before diagnosis between cancer cases and matched controls, revealing co-occurring and mutually exclusive phenotypes. Methods: We identified 55,000+ cancer cases across 21 cancer types in All of Us version 8. To eliminate age-related confounding, we implemented a two-stage matching and censoring strategy: loose matching on demographics to establish index dates and cohort comparability, followed by right-censoring of EHR data (excluding the year before diagnosis/index), then 1:2 matching to address residual demographic imbalance. We tested associations of approximately 1,600 clinical phenotypes with case status among 23,193 cancer cases and 46,386 matched controls using logistic regression adjusted for sex at birth, self-reported race, age at diagnosis/index date, and two censored EHR metrics (observation window and unique condition count), with Bonferroni correction for multiple testing. Results: Our analysis identified 232 significantly associated phenotypes, confirming established cancer risk factors including elevated prostate specific antigen (OR = 2.92, 95% CI: 2.65-3.23; p = 1.8 × 10^-101) and multinodular goiter (OR = 1.73, 95% CI: 1.56-1.91; p = 6.7 × 10^-27). Several phenotypes showed apparent inverse associations that warrant further investigation. Conclusions: This PheWAS of EHR data from at least 1 year before diagnosis leveraged the diversity of All of Us to examine how clinical phenotypes prior to cancer diagnosis vary across cancer types and racial groups.
Our findings validate All of Us as a robust platform for cancer epidemiology research, confirming established risk factors at scale across diverse populations. This work provides methodological insights for EHR-based susceptibility analyses and demonstrates the value of agnostic phenome-wide approaches for generating hypotheses in precision medicine.
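With Bonferroni correction, each test is compared against α divided by the number of tests. A minimal sketch, assuming a family-wise α of 0.05 (not stated in the abstract) and exactly 1,600 tests (the abstract says approximately 1,600 phenotypes), shows that both reported p-values fall far below the corrected threshold.

```python
ALPHA = 0.05    # assumed family-wise error rate; not stated in the abstract
N_TESTS = 1600  # the abstract reports approximately 1,600 phenotypes

threshold = ALPHA / N_TESTS  # about 3.1e-05

# p-values reported in the abstract
reported = {
    "elevated prostate specific antigen": 1.8e-101,
    "multinodular goiter": 6.7e-27,
}

for phenotype, p in reported.items():
    verdict = "significant" if p < threshold else "not significant"
    print(f"{phenotype}: p = {p:.1e} -> {verdict} at Bonferroni threshold {threshold:.2e}")
```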

7
Surgical outcomes in complicated appendicitis: does timing or surgeon seniority matter? A propensity score-matched analysis from the RIFT Turkey cohort

Yalcinkaya, A.; Demirli Atici, S.; Ozen, C.; Karasoy, D.; Kamer, E.; Yalcinkaya, A.; Leventoglu, S.; RIFT Turkey Study Collaborators

2026-05-26 surgery 10.64898/2026.05.19.26353556 medRxiv
Top 12%
0.0%

Background: Complicated acute appendicitis carries a higher risk of postoperative morbidity than uncomplicated appendicitis. It remains unclear whether surgical timing (night vs. day; weekend vs. weekday) or surgeon seniority influences short-term outcomes in this high-risk population. Methods: This was a retrospective analysis of the RIFT Turkey dataset restricted to histologically confirmed cases of complicated appendicitis treated with laparoscopic appendectomy. Primary exposures were surgical timing (day [n=92] vs. night [n=123]; weekday [n=172] vs. weekend [n=43]) and surgeon seniority (trainee [n=89] vs. consultant [n=126]). The primary outcome was unplanned readmission and/or reintervention within 60 days. Secondary outcomes were conversion to open surgery and length of stay (LOS) >3 days. Propensity score matching (PSM) using the RIPASA score (caliper 0.05, SMD <0.1) was performed as a pre-specified sensitivity analysis for each comparison. Results: Night-time surgery was associated with higher frequencies of unplanned readmission/reintervention (12.2% vs. 6.5%; OR 1.99 [95% CI 0.74-5.35], p=0.166) and conversion to open surgery (9.8% vs. 3.3%; OR 3.21 [0.88-11.72], p=0.064) compared with daytime surgery, although neither difference reached significance. Trainee surgeons had significantly higher readmission/reintervention rates than consultants (15.7% vs. 5.6%; OR for consultant vs. trainee 0.32 [0.12-0.82], p=0.013). PSM-adjusted results showed similar relationships: night vs. day (readmission OR 2.45 [0.85-7.03], p=0.09; conversion OR 2.84 [0.73-11.1], p=0.13), weekend vs. weekday (readmission OR 1.53 [0.24-9.72], p=0.65), and surgeon seniority (readmission OR 0.25 [0.08-0.79], p=0.013). Conclusion: Surgical timing was not significantly associated with short-term outcomes in complicated appendicitis, though night-time surgery showed a consistent trend towards higher complication rates.
Surgeon seniority was the only factor independently and significantly associated with unplanned readmission and reintervention in both primary and PSM analyses, indicating the need for senior supervision during out-of-hours procedures. Keywords: complicated appendicitis; surgical timing; night surgery; weekend effect; surgeon seniority; propensity score matching; RIFT Turkey
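The reported odds ratios can be roughly reproduced from the percentages with OR = [p₁/(1 − p₁)] / [p₀/(1 − p₀)]. A minimal sketch follows; because it uses rounded percentages rather than counts, the results only approximate the published values. It indicates that the seniority OR of 0.32 corresponds to consultants relative to trainees, so the trainee-versus-consultant odds ratio is its reciprocal, roughly 3.1.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Odds ratio of an outcome in one group relative to a reference group."""
    return odds(p_group) / odds(p_reference)


# Readmission/reintervention proportions reported in the abstract (rounded).
night_vs_day = odds_ratio(0.122, 0.065)           # about 2.00; reported OR 1.99
consultant_vs_trainee = odds_ratio(0.056, 0.157)  # about 0.32; matches reported OR 0.32
trainee_vs_consultant = odds_ratio(0.157, 0.056)  # about 3.14, the reciprocal

print(round(night_vs_day, 2), round(consultant_vs_trainee, 2), round(trainee_vs_consultant, 2))
```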

8
Cancer Prevalence and Patterns in Kilifi County: A 10-year Retrospective Descriptive Study

Masha, M.; Mbugua, R. W.; Abdullahi, M.; Sheikh, N. A.; Omar, A.; Abdihamid, O.

2026-06-01 oncology 10.64898/2026.05.20.26353643 medRxiv
Top 12%
0.0%
Show abstract

Background: Cancer is an increasing public health challenge in Kenya, particularly in rural and underserved regions where surveillance systems and diagnostic capacity remain limited. Kilifi County, located along the Kenyan coast, lacks a population-based cancer registry, and data on the local cancer burden are not available. This study aimed to characterize the demographic distribution of patients, the cancer burden in the county, and the management of cancer cases diagnosed at Kilifi County Referral Hospital (KCRH) over ten years. Methods: This retrospective study analyzed the patterns of cancer in Kilifi County using patient records from KCRH during the study period (January 1, 2014, to January 1, 2024). Results: A total of 101 patients with cancer were identified; 58% were female, and the mean age was 54 years. The largest share of patients came from Kilifi North (47%), and a high proportion reported no formal occupation (41%) or farming (26%). Esophageal and cervical cancers were the most common (18% each), followed by breast and prostate cancers (5% each), with other malignancies occurring infrequently. Histopathology was the primary diagnostic modality (88%). Staging data were incomplete in 70% of cases; among documented cases, the majority presented with advanced disease (21% stage IV). Due to limited local treatment capacity, approximately half of the patients were referred to tertiary centers for chemotherapy, radiotherapy, or surgery. At data cut-off, 43% had died, 25% were on treatment, and 29% were lost to follow-up, with only 2% having completed treatment or remaining under follow-up. Conclusions: This study demonstrates a substantial cancer burden in Kilifi County and highlights critical gaps in diagnostic capacity, staging, and continuity of care. Strengthening cancer surveillance systems, expanding diagnostic and treatment infrastructure, and establishing a population-based cancer registry are essential to improving cancer outcomes and advancing equitable care in rural Kenya.

9
Association of Clonal Hematopoiesis with Total and Cause-Specific Mortality Among Older Women

Chang, A.; Ezzat, D.; Uddin, M. M.; Pershad, Y.; Collins, J. M.; Kitzman, J.; Jaiswal, S.; Desai, P.; Shadyab, A.; Anderson, G. L.; Casanova, R.; Wallace, R.; Wactawski-Wende, J.; Bick, A. G.; Natarajan, P.; Kooperberg, C.; LaMonte, M. J.; Whitsel, E. A.; Manson, J. E.; Reiner, A. P.; Honigberg, M. C.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354392 medRxiv
Top 13%
0.0%

Clonal hematopoiesis of indeterminate potential (CHIP) represents the age-related expansion of hematopoietic stem cells with preleukemic mutations. However, its association with all-cause and cause-specific mortality has not been well characterized in older adults. We aimed to evaluate whether CHIP is associated with all-cause and cause-specific mortality in a population of older women in the United States. Our study included 6,704 participants in the Women's Health Initiative Long Life Study (WHI-LLS) without hematologic malignancy. The co-primary exposures were any CHIP (variant allele frequency [VAF] ≥ 2%) and large CHIP (VAF ≥ 10%), and the primary outcome was all-cause mortality. Multivariable-adjusted Cox proportional hazards models tested the associations of CHIP and CHIP subtypes with all-cause and cause-specific mortality. Any CHIP and large CHIP were independently associated with all-cause mortality, with multivariable-adjusted hazard ratios (aHRs) of 1.12 (95% confidence interval [CI] 1.04-1.21; P = 0.003) and 1.28 (95% CI 1.15-1.43; P < 0.001), respectively. In gene-specific analyses, non-DNMT3A CHIP was associated with all-cause mortality (aHR: 1.22 [95% CI: 1.12-1.34], P < 0.001), while DNMT3A CHIP was not (aHR: 1.07 [95% CI: 0.98-1.18], P = 0.13). Furthermore, large CHIP was associated with cardiovascular (aHR: 1.29 [95% CI: 1.08-1.55], P = 0.006), cancer (aHR: 1.49 [95% CI: 1.11-2.02], P = 0.009), and neurologic (aHR: 1.40 [95% CI: 1.07-1.84], P = 0.02) death. In this cohort of older women, CHIP, particularly large clones and non-DNMT3A CHIP, was associated with all-cause and cause-specific mortality. These findings suggest that clonal size and subtype may differentially influence mortality risk.
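The two co-primary exposures are nested: every participant with large CHIP (VAF ≥ 10%) also meets the any-CHIP definition (VAF ≥ 2%). A minimal sketch of that classification follows, assuming each participant is summarized by the largest VAF among their detected CHIP variants; variant calling itself is outside this sketch, and `chip_exposures` is a hypothetical helper.

```python
def chip_exposures(max_vaf: float) -> tuple[bool, bool]:
    """Return (any_chip, large_chip) for a participant's largest CHIP variant allele frequency.

    max_vaf is a fraction (0.02 means 2%). Thresholds follow the abstract:
    any CHIP at VAF >= 2%, large CHIP at VAF >= 10%, so large CHIP is a subset of any CHIP.
    """
    any_chip = max_vaf >= 0.02
    large_chip = max_vaf >= 0.10
    return any_chip, large_chip


for vaf in (0.01, 0.05, 0.12):  # toy VAFs, not study data
    print(vaf, chip_exposures(vaf))
```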

10
Mid-Pregnancy Maternal Leukocyte Telomere Length and Preterm Birth in a Population-Based Hispanic/Latina California Cohort

Garay, O.; Oltman, S.; Bear, R. J.; Lin, J.; Wojcicki, J. M.; Ryckman, K. K.; Jelliffe-Pawlowski, L. L.

2026-05-30 genetic and genomic medicine 10.64898/2026.05.27.26354189 medRxiv
Top 13%
0.0%

Background: Preterm birth (PTB) rates among Hispanic/Latina individuals in the United States have risen over the past decade. Data suggest this rise may be driven in part by psychosocial stress. Leukocyte telomere length (LTL), a marker of cumulative cellular aging that shortens under chronic stress, may capture stress-related biological vulnerability, but it has not been examined as a potential population-level contributor to PTB in Hispanic/Latina pregnancies. Objective: To examine the association between mid-pregnancy maternal LTL and PTB in a population-based Hispanic/Latina cohort. Methods: In a case-control study nested within a California singleton birth cohort (n = 436 Hispanic/Latina individuals; 215 PTB, 221 term births), LTL was measured by quantitative PCR in biobank specimens collected at 15 to 20 weeks of gestation. Covariates from linked birth certificate and hospital discharge records were included. Logistic regression estimated ORs and 95% CIs for PTB by LTL, examined continuously and by percentile category (≤10th, 11th-89th, ≥90th), with and without covariate adjustment. Results: Mean and median LTL did not differ between PTB and term births. LTL at or below the 10th percentile was associated with elevated odds of PTB relative to full-term birth (12.6% versus 4.3%; ORc = 3.2, 95% CI 1.3-7.9), and the association persisted after partial (ORadj1 = 3.2, 95% CI 1.3-8.3) and full covariate adjustment (ORadj2 = 3.4, 95% CI 1.3-9.3). Subgroup analyses showed consistent directional patterns across PTB subgroups and for early term birth (ORadj2 = 5.1, 95% CI 1.5-17.0). Conclusions: Mid-pregnancy maternal LTL ≤10th percentile was associated with more than three times the odds of PTB, with risk concentrated in the extreme low tail of the distribution. Consistent with a cumulative allostatic load model, markedly short LTL at mid-gestation may reflect elevated stress-related biological risk for preterm delivery.
These findings support upstream investment in stress reduction and prospective LTL research in high-burden populations.
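In a case-control design, the odds ratio is computed from the exposure odds among cases and controls, and it estimates the odds ratio for the outcome. A minimal check follows, assuming the reported 12.6% and 4.3% are the proportions of PTB cases and term births, respectively, with LTL at or below the 10th percentile, and treating all other participants as the reference group (the abstract's reference category may instead be the 11th-89th percentile group). Under those assumptions it approximately reproduces the crude ORc of 3.2.

```python
def odds(p: float) -> float:
    """Convert a proportion (0 < p < 1) to odds."""
    return p / (1.0 - p)


# Share of each group with LTL at or below the 10th percentile (from the abstract).
exposed_among_ptb_cases = 0.126
exposed_among_term_controls = 0.043

# Exposure odds ratio: odds of short LTL among cases divided by odds among controls.
crude_or = odds(exposed_among_ptb_cases) / odds(exposed_among_term_controls)
print(round(crude_or, 1))  # 3.2
```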

11
The Verification Gap: Artificial Intelligence Adoption, Hallucination Awareness, and Verification Practices Among Early Career Medical Researchers in Pakistan

Sajjad, M.

2026-05-30 health informatics 10.64898/2026.05.28.26354373 medRxiv
Top 13%
0.0%

Artificial intelligence (AI) tools have been rapidly adopted by medical researchers, yet whether early-career researchers in low- and middle-income countries have the awareness and habits needed to use these tools safely remains poorly documented. This study characterized AI adoption patterns, hallucination awareness, and verification and disclosure practices among early-career medical researchers in Pakistan. A cross-sectional anonymous online survey was conducted among medical students, house officers, residents, physicians, and faculty involved in research or academic work across Pakistan (May 2026). Descriptive statistics and chi-square tests were applied to 373 eligible responses. AI use was near universal (99.7%), with 60.3% using AI tools daily. The most commonly reported tool in this sample was Claude (40.5%), followed by ChatGPT (29.2%) and Perplexity (26.0%), though this ranking likely reflects sampling characteristics. Despite high adoption, 59.2% typically did not verify AI outputs before use, and 40.2% had never heard that AI can generate fabricated scientific references. In behavioral vignettes, 36.5% assumed convincing AI-generated references were authentic, and 54.2% would continue using the remaining AI content after discovering one fabricated reference. Formal research training was strongly associated with consistent disclosure (51.7% vs. 17.1%; chi-square = 48.43, p < 0.001). Role, daily use frequency, and research training were not significantly associated with verification behavior. Early-career medical researchers in Pakistan demonstrate high AI adoption alongside incomplete hallucination awareness and infrequent verification, a pattern that may carry implications for research integrity. Formal training was the only factor significantly associated with consistent disclosure. Integration of AI literacy into medical curricula and institutional governance frameworks merits consideration.

12
Cleaner Air for Lower Cardiometabolic Risk: protocol for a double-blind, randomized, sham-controlled trial of HEPA filtration in adults with prediabetes.

Wittkopp, S.; Asachi, P.; Kazatsker, F.; Aleman, J. O.; Gordon, T.; Brook, R.; Thorpe, L.; Newman, J. D.

2026-06-01 endocrinology 10.64898/2026.05.29.26354420 medRxiv
Top 13%
0.0%

Introduction: Air pollution is a leading driver of cardiovascular disease, and a growing body of literature implicates it in impaired glucose homeostasis. Increases in fine particulate matter air pollution (PM2.5) are associated with increased blood glucose and hemoglobin A1c across the glycemic spectrum, from normoglycemia to prediabetes to all forms of diabetes. Despite strong evidence for positive associations of PM2.5 with dysglycemia, it remains unknown whether reducing air pollution exposure through air filtration can improve glucose. This study aims to test the hypothesis that short-term, in-home air pollution reduction using high efficiency particulate air (HEPA) filtration will improve blood glucose in adults with prediabetes. Methods and analysis: This is a randomized, double-blind, sham-controlled trial of the effects of lowering air pollution exposure with HEPA filtration on cardiometabolic health in adults with prediabetes living in the New York City area. Participants will be randomly assigned to use true or sham bedroom air cleaners, with PM2.5 measured continuously for 1 month. The primary outcomes will be continuous glucose monitoring metrics measured before and after HEPA air filtration. Exploratory outcomes will include insulin resistance measures, serum biomarkers, and transcriptomics measured before and after the HEPA intervention. We will quantify the effects of HEPA filtration with models using treatment arm (true versus sham filtration) as the independent variable. Secondary analyses will model continuous measures of PM2.5 as the independent variable. Ethics and dissemination: This study has undergone peer review. The work was supported by Grant 2023-0214 from the Doris Duke Foundation, which had no other role in study design or implementation. The study was registered in ClinicalTrials.gov (NCT05994937) prior to recruitment. Trial registration: ClinicalTrials.gov NCT05994937; https://clinicaltrials.gov/study/NCT05994937

13
TopBrain Segmentation Challenge for Whole Brain Vessel Anatomy

Yang, K.; Shi, P.; Huang, H.; Musio, F.; Baazaoui, H.; Aydin, O. U.; Hilbert, A.; Hamadache, R. E.; Yalcin, C.; Zhang, M.; Falcetta, D.; de la Rosa, E.; Shit, S.; Prabhakar, C.; Wittmann, B.; Rokuss, M. R.; Kirchhoff, Y.; Al-Maskari, R.; Hoeher, L.; Juchler, N.; Casamitjana, A.; Cleary, J.; Schmick, A.; Baumgartner, P.; Deseoe, J.; Vandans, O.; Lee, D.; Oh, K.; LaBella, D.; Mazher, M.; Niederer, S. A.; Qayyum, A.; Liu, Y.; Chen, J.; Kim, W.; Asawalertsak, N.; Kim, M.; Shin, D.; Park, S.-H.; Kikuchi, S.; Zhang, Y.; Liu, J.; Cui, Y.; Qiu, Y.; Verschuur, A.; Zhang, J.; van der Schaaf, I.; Su, R.;

2026-05-30 radiology and imaging 10.64898/2026.05.28.26354312 medRxiv
Top 14%
0.0%

We present the TopBrain 2025 Challenge, the first benchmark for fine-grained multiclass segmentation of the whole brain vasculature in both computed tomography angiography (CTA) and magnetic resonance angiography (MRA). Building on the TopCoW challenge, TopBrain scales vessel annotation from the Circle of Willis to the entire brain, introducing a dataset of 90 annotated volumes across 48 landmark vessel classes spanning arterial and venous systems, of which 50 training volumes are publicly released. Vessel definitions were consolidated from established neuroanatomical references into a unified annotation scheme, and vessel caliber measurements along the centerline are reported for the first time across the whole brain vascular anatomy. To address the unique challenges of multiclass brain vessel segmentation, we propose an evaluation framework that accounts for detection in segmentation performance, assesses anatomical plausibility, and introduces novel contamination metrics that characterize inter-class prediction errors. Fifteen teams from over 220 registered participants submitted algorithms to the benchmark. The top-performing teams built on nnUNet with principled system design choices, achieving around 80% Dice scores, near-zero invalid neighbor counts, over 60% F1 scores for side-road vessels, and below 18% foreground contamination ratio. Larger vessels are easier to segment, while smaller and more complex vessels remain the true bottleneck. The annotated datasets and podium-finish algorithms are made publicly available on Zenodo.

14
Personalized Brain-Based Analgesia Detection with Portable fNIRS and AI

Minoccheri, C.; Joo, P.; Hu, X.-S.; Affendi, H.; Elayyan, F.; Harville, A.; McDonald, N. J.; Botero, T.; DaSilva, A. F.

2026-05-28 dentistry and oral medicine 10.64898/2026.05.20.26353377 medRxiv
Top 16%
0.0%

Neuroimaging-based pain decoding faces two underappreciated challenges: between-subject variability that prevents classifiers from generalizing across patients, and within-session cross-validation designs that inflate reported accuracy by conflating within-person and between-person variance. Here we address both using portable functional near-infrared spectroscopy (fNIRS) during pharmacologically verified local nerve anesthesia. Twenty-five patients with clinically painful teeth underwent 36-channel bilateral fNIRS during percussion before ("Pre") and after ("Post") local nerve anesthesia. In 13 block-success patients, a paired Pre versus Post comparison with a healthy-tooth control identified three temporal hemodynamic response function (HRF) features (late slope, mean first derivative, and baseline-normalized amplitude) whose analgesia interaction effects (d = 0.63 to 0.79) exceeded that of raw general linear model (GLM) amplitude (d = 0.56), with a significant difference-in-differences interaction (p = 0.011). Per-patient calibration with these features yielded leave-one-subject-out (LOSO) AUC = 0.68 to 0.76 for nonlinear classifiers (permutation p = 0.002), with HbO-specific feature selection achieving the best performance (RF AUC = 0.760); a healthy-tooth negative control was non-significant. End-to-end deep learning on raw time series (CNN-LSTM AUC = 0.719) was competitive with feature-based classifiers, while linear models did not reach significance. Critically, a head-to-head comparison of within-session CV and LOSO on the same data revealed a mean inflation of +0.13 AUC across all model types, including deep learning, demonstrating that high within-session accuracy alone does not establish subject-independent validity.
Exploratory analyses suggested complementary roles for oxyhemoglobin (HbO; within-patient analgesia detection) and deoxyhemoglobin (HbR; cross-patient information), and that trial-to-trial response variability may complement amplitude for cross-patient pain detection. These results show that per-patient calibration with temporal HRF features supports subject-independent analgesic-state detection under strict LOSO evaluation, and that within-session validation (standard in the fNIRS pain-decoding literature) can substantially overestimate performance.
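The difference between within-session cross-validation and LOSO comes down to how trials are split: in LOSO, every trial from the held-out patient goes to the test fold, so the classifier never sees that patient during training. A minimal dependency-free sketch of the split is below; `leave_one_subject_out` is a hypothetical helper, and scikit-learn's `LeaveOneGroupOut` in `sklearn.model_selection` provides the same grouping if that library is available.

```python
from collections import defaultdict
from typing import Iterator, Sequence


def leave_one_subject_out(subject_ids: Sequence[str]) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject.

    subject_ids[i] names the patient who produced trial i. All trials of the
    held-out patient go to the test fold; no patient appears in both folds.
    """
    trials_by_subject: dict[str, list[int]] = defaultdict(list)
    for idx, sid in enumerate(subject_ids):
        trials_by_subject[sid].append(idx)
    for held_out in sorted(trials_by_subject):
        test_idx = trials_by_subject[held_out]
        train_idx = [i for i, sid in enumerate(subject_ids) if sid != held_out]
        yield held_out, train_idx, test_idx


trial_subjects = ["p1", "p1", "p2", "p3", "p3"]  # toy trial-to-patient labels
for held_out, train_idx, test_idx in leave_one_subject_out(trial_subjects):
    print(held_out, train_idx, test_idx)
```

By contrast, a within-session split can place trials from the same patient in both folds, letting a model exploit patient-specific signal; the abstract reports a mean inflation of +0.13 AUC when within-session CV is used instead of LOSO.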

15
Hierarchical organ aging signatures from routine abdominal CT add incremental disease risk stratification beyond blood biomarkers

Deng, Z.; Wang, Y.; Shi, Y.; Wang, L.; Qureshi, T. A.; Gaddam, S.; Javed, S.; Hsu, Y.-C.; De Righi, D. R.; Azab, L.; Diwan, G.; Yang, J. D.; Xie, Y.; Yuan, C.; Vendrami, C. L.; Rodriguez, A.; Specht, K.; Jeon, C. Y.; Chaudhry, H.; Buxbaum, J.; Pisegna, J. R.; Yaghmai, V.; Goessling, W.; Hernandez-Barco, Y. G.; Miller, F. H.; Tirkes, T.; Espinoza, S.; Musi, N.; Dey, D.; Sung, K. H.; Pandol, S. J.; Li, D.

2026-05-27 radiology and imaging 10.64898/2026.05.19.26353206 medRxiv
Top 16%
0.0%

Biological aging is heterogeneous across organ systems, yet two questions remain untested: whether CT-derived abdominal aging provides prognostic value beyond routine clinical data, and whether decomposing it by organ adds value beyond a unified estimate. We developed organ-specific and ensemble biological age models from radiomic features across five abdominal organs in 68,675 CT scans from 32,883 subjects and evaluated them by their alignment with chronological age in healthy subjects (nested cross-validation: MAE = 3.68 years, R² = 0.90). In sequential analyses restricted to adults aged 20-60 years, the stratum with the strongest association between biological age gap (BAG) and disease, ensemble BAGs provided incremental prognostic value beyond demographic covariates for all-cause disease and mortality (ΔC-index = 0.141 and 0.051, respectively) and beyond routine blood biomarkers (ΔC-index = 0.048), confirming that CT-derived aging captures structural information beyond laboratory markers. Organ-specific biological age added incremental prognostic value beyond the ensemble selectively for focal diseases: cardiovascular (aorta, ΔC-index = 0.091) and hepato-pancreatic (pancreas, ΔC-index = 0.096). These findings establish a hierarchical organization of CT-derived biological aging and position routine CT as a source of prognostic value that complements existing clinical biomarkers.
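The incremental prognostic value above is expressed as a change in C-index. This is the concordance of a survival model that includes the biological age gap, minus the concordance of the same model without it. The sketch below is a minimal implementation of Harrell's C-index. It assumes risk scores have already been produced by some fitted model; the abstract does not say which C-index variant was used. The cohort and scores are invented, and pairs with tied follow-up times are skipped as a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's C: share of comparable pairs in which the earlier event has the higher risk.

    A pair is comparable when the subject with the shorter follow-up had the event.
    Ties in predicted risk count 0.5; pairs with tied follow-up times are skipped (a simplification).
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        a, b = (i, j) if times[i] < times[j] else (j, i)  # a = shorter follow-up
        if not events[a]:
            continue  # a was censored before b's time: the true ordering is unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (invented): follow-up in years, 1 = disease/death observed, 0 = censored.
times = [1, 2, 3, 4, 5]
events = [1, 1, 1, 1, 0]
risk_base = [3, 4, 2, 1, 5]       # e.g. a demographics-only model
risk_with_bag = [5, 4, 2, 3, 1]   # hypothetical same model plus the biological age gap
c_base = harrell_c_index(times, events, risk_base)      # 0.5
c_bag = harrell_c_index(times, events, risk_with_bag)   # 0.9
print(f"Delta C-index = {c_bag - c_base:.2f}")          # Delta C-index = 0.40
```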

16
Optical coherence tomography as a biomarker for frontotemporal dementia: a systematic review & meta-analysis

Wang, E.; Kohli, A.; Taha, H. B.

2026-05-27 neurology 10.64898/2026.05.19.26353366 medRxiv
Top 16%
0.0%

Background: Frontotemporal dementia (FTD) lacks widely accessible disease-specific biomarkers. Optical coherence tomography (OCT) and OCT angiography (OCTA) may provide non-invasive measures of retinal changes associated with neurodegeneration. We conducted a systematic review and meta-analysis evaluating retinal biomarkers in FTD compared with Alzheimer disease (AD) and controls. Methods: A systematic search of PubMed and Embase was conducted through April 25, 2026, according to PRISMA guidelines. Studies evaluating OCT/OCTA biomarkers in FTD with comparator groups were included. Inverse-variance weighted random-effects models, publication bias assessments, and meta-regressions were performed. Results: Ten studies involving 139 individuals with FTD, 87 with AD, 29 with mild cognitive impairment, 14 with TDP-43 proteinopathy, 5 with tauopathy, and 255 controls were included in the systematic review; five studies were eligible for meta-analysis. Compared with AD, individuals with FTD had significantly lower retinal nerve fiber layer (RNFL) thickness (SMD = -0.61, 95% CI -0.98 to -0.24). Compared with controls, individuals with FTD had significantly lower ganglion cell layer-inner plexiform layer (GCL-IPL) thickness (SMD = -0.55, 95% CI -1.02 to -0.08), whereas pooled analyses across multiple retinal biomarkers were non-significant (SMD = -0.19, 95% CI -0.52 to 0.14). RNFL thickness correlated negatively with the percentage of female participants in FTD and positively with age in both AD and controls. Conclusions: Individuals with FTD exhibit lower RNFL thickness than those with AD and lower GCL-IPL thickness than controls, suggesting that retinal alterations may reflect neurodegeneration. However, larger longitudinal studies with standardized OCT/OCTA protocols are needed to determine the diagnostic and prognostic utility of retinal biomarkers in FTD.
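The abstract pools standardized mean differences with inverse-weighted random-effects models. In the usual formulation, each study is weighted by the inverse of its sampling variance plus the between-study variance tau². The abstract does not name the tau² estimator, so the sketch below assumes the DerSimonian-Laird method, a common default. The effect sizes and variances are invented toy values, not the review's data.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling with inverse-variance weights.

    Returns (pooled effect, 95% CI lower, 95% CI upper, tau^2, I^2 in percent).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                                 # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                               # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]                   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2, i2


# Two invented studies with SMDs of 0.0 and -1.0, each with variance 0.1.
pooled, lo, hi, tau2, i2 = random_effects_pool([0.0, -1.0], [0.1, 0.1])
print(f"{pooled:.2f} [{lo:.2f}, {hi:.2f}], tau^2={tau2:.2f}, I^2={i2:.0f}%")
# -0.50 [-1.48, 0.48], tau^2=0.40, I^2=80%
```

With tau² = 0, the weights reduce to plain inverse-variance weights and the result matches a fixed-effect analysis.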

17
Vaginal Antisepsis for Major Gynecologic Surgeries Using Chlorhexidine Gluconate versus Povidone Iodine: A Systematic Review and Meta-Analysis

Dias, Y.; Gebrekidan, F.; Lowder, J.; Sutcliffe, S.; Yaeger, L.

2026-05-27 obstetrics and gynecology 10.64898/2026.05.26.26353429 medRxiv
Top 16%
0.0%

OBJECTIVE: We performed a systematic review and meta-analysis (SRMA) of post-surgical outcomes, comparing chlorhexidine gluconate (CHG) versus povidone iodine (PI) for vaginal antisepsis in major gynecologic procedures. DATA SOURCES: Ovid Medline, Embase, Scopus, Cochrane, and ClinicalTrials.gov were searched from 1986 to December 2023 for studies comparing CHG with PI for vaginal antisepsis in major gynecologic operations. STUDY ELIGIBILITY CRITERIA: We included randomized controlled trials (RCTs) and non-RCTs comparing CHG to PI for vaginal antisepsis in major gynecologic operations. The primary outcome was surgical site infection (SSI); secondary outcomes were urinary tract infection (UTI) and vaginal irritation. METHODS: Summary estimates were calculated with fixed-effects models when I² ≤ 25% and with random-effects models when I² > 25%. Statistical analysis was performed using RevMan 5.4.1. The protocol for this systematic review was registered on PROSPERO (ID CRD42022378101). RESULTS: Nine studies met the inclusion criteria, four of which were RCTs. A total of 9,538 patients were included, 4,300 (45%) of whom were allocated to CHG and 5,238 (55%) to PI. No statistically significant difference in SSI incidence was found for vaginal antisepsis with CHG versus PI in pooled analyses (n = 9,538 patients; RR 1.20; 95% CI 0.92-1.57; I² = 0%). In contrast, a significantly higher risk of UTI was observed with CHG than with PI (n = 6,061 patients; RR 1.48; 95% CI 1.03-2.14; I² = 0%). CONCLUSION: In our SRMA, SSI risk did not differ significantly between CHG and PI for antiseptic vaginal preparation. Interestingly, vaginal antisepsis with PI was associated with a lower incidence of postoperative UTIs following major gynecologic surgery. Our findings support current guidelines stating that either form of vaginal antisepsis can be used for SSI prevention.
They also suggest that PI may result in fewer postoperative UTIs, but further randomized studies are needed to confirm these findings. Key words: surgical site infection, surgical wound infection, urinary tract infection, urogynecologic surgery, Chlorhexidine, Povidone Iodine, surgical antiseptic
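Each study in a meta-analysis like this one contributes a risk ratio, and the METHODS section chooses the pooling model using an I² threshold. The minimal sketch below covers both pieces. It assumes a Wald interval on the log scale and no zero cells. The counts are hypothetical, and the cross-study pooling that RevMan performs is not reproduced here.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A vs group B with a Wald CI on the log scale (assumes no zero cells)."""
    rr = (events_a / total_a) / (events_b / total_b)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    half = z * se_log
    return rr, math.exp(math.log(rr) - half), math.exp(math.log(rr) + half)


def choose_model(i2_percent, threshold=25.0):
    """The abstract's rule: fixed effects when I^2 <= 25%, random effects otherwise."""
    return "fixed" if i2_percent <= threshold else "random"


# Hypothetical single study: 12/400 UTIs with CHG vs 8/400 with PI.
rr, lo, hi = risk_ratio(12, 400, 8, 400)
print(f"RR {rr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(choose_model(0.0))  # fixed (both pooled analyses in the abstract report I^2 = 0%)
```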

18
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 16%
0.0%

Background: Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost screening for cardiac dysfunction, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit more variable cardiac morphology and function than adults, which may be useful for learning generalizable AI-ECG models. Methods: We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments derived from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated it on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked by AUROC against several AI-ECG foundation models across age groups, lesion types, and limited-data scenarios. Findings: The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, right ventricular ejection fraction (RVEF), and ventricular dilation. In external validation on adults, ECG-Fyler achieved an AUROC of 0.83 (95% CI: [0.82-0.85]) for LVEF ≤ 40%.
After fine-tuning on less than 10% of the external data, performance for LVEF ≤ 45% (AUROC: 0.87 [0.86-0.88]) exceeded that of a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation: Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding: National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital
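The AUROC values above are reported with 95% confidence intervals, but the abstract does not state how those intervals were obtained. One common option is a percentile bootstrap, and the sketch below assumes that approach. It binarizes a hypothetical echo-derived LVEF at the 40% threshold and resamples individual ECGs. This ignores the clustering of several ECGs per patient; a real analysis might resample patients instead. All values are invented.

```python
import random


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUROC needs both classes")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for AUROC, resampling individual ECGs with replacement."""
    if all(labels) or not any(labels):
        raise ValueError("AUROC needs both classes")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if all(y) or not any(y):
            continue  # skip resamples that contain only one class
        stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(n_boot * alpha / 2)], stats[int(n_boot * (1 - alpha / 2)) - 1]


# Hypothetical echo LVEF values (%) paired with model scores; positive label = LVEF <= 40%.
lvef = [25, 35, 38, 42, 50, 55, 60, 65]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.35, 0.1]
labels = [1 if v <= 40 else 0 for v in lvef]
print(auroc(labels, scores))  # 14/15, about 0.933
print(bootstrap_ci(labels, scores))
```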

19
Patient Versus Prediction-Level Evaluation of a Dynamic Clinical Prediction Model of Sepsis

Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.

2026-05-27 health systems and quality improvement 10.64898/2026.05.26.26354141 medRxiv
Top 16%
0.0%

The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded in the electronic medical record that warns clinicians which hospitalized patients are at risk of sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction before sepsis onset is used to measure performance) with analyses at a novel prediction level (where every prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86; IQR 0.85-0.87), whereas prediction-level analyses demonstrated lower performance (AUC 0.62; IQR 0.57-0.65). Low positive predictive values (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
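The gap between the two AUCs follows from the unit of analysis. In the sketch below, a patient-level evaluation keeps one maximum score per hospitalization. A prediction-level evaluation scores every prediction and labels each one with its hospitalization's outcome; this labeling is an assumption, not something stated in the abstract. The stays and scores are hypothetical, and septic stays are assumed to contain only pre-onset predictions.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical hospitalizations; septic stays list only scores issued before sepsis onset.
stays = [
    {"sepsis": 1, "scores": [0.1, 0.2, 0.9]},
    {"sepsis": 1, "scores": [0.2, 0.8]},
    {"sepsis": 0, "scores": [0.1, 0.3, 0.2, 0.4]},
    {"sepsis": 0, "scores": [0.2, 0.5]},
]

# Patient level: one (label, score) pair per stay, using the stay's maximum score.
patient = [(s["sepsis"], max(s["scores"])) for s in stays]
# Prediction level: every score is its own example, labeled with its stay's outcome (assumed).
prediction = [(s["sepsis"], x) for s in stays for x in s["scores"]]

patient_auc = auroc([y for y, _ in patient], [x for _, x in patient])
prediction_auc = auroc([y for y, _ in prediction], [x for _, x in prediction])
print(patient_auc)     # 1.0
print(prediction_auc)  # 0.55
```

Taking the maximum credits a model for a single high score among many low ones. That is why the patient-level AUC in this toy example is higher than the prediction-level AUC computed on the same predictions.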

20
Grounding Language Models in Behavioral Science to Scale Physical Activity Interventions for Hispanic/Latinx Populations

Mantena, S. D.; Johnson, A.; Schuetz, N.; Tolas, A.; Montalvo, S.; Delgado-SanMartin, J.; Ramirez Posada, M.; Du, L.; Zhang, S.; Huynh, A. D.; Oppezzo, M.; King, A. C.; Schmiedmayer, P.; Lawrie, A.; Rodriguez, F.; Ashley, E.; Kim, D. S.

2026-05-28 cardiovascular medicine 10.64898/2026.05.26.26354165 medRxiv
Top 16%
0.0%

Objective: Hispanic/Latinx populations in the U.S. experience higher rates of chronic disease linked to physical inactivity, yet digital health interventions remain largely inaccessible to more than 16 million Hispanic/Latinx adults with limited English proficiency. While large language models (LLMs) offer scalable personalization, their use in non-English behavioral coaching remains unexplored. This study introduces MHC-Coach-ES, a Spanish-language LLM fine-tuned on the Transtheoretical Model (TTM) of behavior change. Materials and Methods: We fine-tuned Llama 3-70B-Instruct using a two-stage pipeline. First, the model was adapted to Spanish health and motivational language using a 2.21-million-token corpus. Second, it was instruction-tuned on 3,268 translated human-written messages to align it with the TTM. We compared MHC-Coach-ES with Llama 3-70B-Instruct and translated human-expert messages using a forced-choice preference survey (N = 77) and blinded expert review (N = 2). Results: Spanish-speaking participants significantly preferred MHC-Coach-ES messages over translated human-expert messages (81% preference, P < 0.001). Linguistic analysis showed that MHC-Coach-ES produced more temporally anchored messages than the base model (65% vs. 20%) while maintaining readability. In blinded evaluation, clinical experts rated MHC-Coach-ES higher than human-expert messages for alignment with TTM stages (4.83 vs. 4.38 out of 5). The base model also outperformed translated expert messages in both preference and expert ratings. Conclusions: Generative AI can operationalize behavioral science frameworks in Spanish, offering a scalable approach to reducing health disparities. The strong performance of both MHC-Coach-ES and the base model highlights the promise of generative, personalized approaches over translation-based localization for theory-driven behavioral interventions.
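The abstract reports the forced-choice preference (81%, P<0.001) without naming the statistical test. For a two-option design, one common choice is an exact two-sided binomial test against 50%. The sketch below assumes independent choices, which would not hold if each participant contributed several. Its counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small relative tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))


# Hypothetical: 40 of 50 forced choices favour the fine-tuned model.
print(binomial_two_sided_p(40, 50))
```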